The Australian National Corpus: National Infrastructure for Language Resources

Authors

  • Steve Cassidy
  • Michael Haugh
  • Pam Peters
  • Mark Fallu
Abstract

The Australian National Corpus has been established in an effort to make currently scattered and relatively inaccessible data available to researchers through an online portal. In contrast to other national corpora, it is conceptualised as a linked collection of many existing and future language resources representing language use in Australia, unified through common technical standards. This approach allows us to bootstrap a significant collection and add value to existing resources by providing a unified, online tool-set to support research in a number of disciplines. This paper provides an outline of the technical platform being developed to support the corpus and a brief overview of some of the collections that form part of the initial version of the Australian National Corpus.

Similar resources

Interoperable Annotation in the Australian National Corpus

The Australian National Corpus (AusNC) provides a technical infrastructure for collecting and publishing language resources representing Australian language use. As part of the project we have ingested a wide range of resource types into the system, bringing together the different meta-data and annotations into a single interoperable database. This paper describes the initial collections in Aus...

The AusNC Project: Plans, Progress and Implications for Language Technology

In the last eighteen months, a consensus has emerged from researchers in various disciplines that a vital piece of research infrastructure is lacking in Australia, namely, a substantial collection of computerised language data. A result of this consensus is an initiative aimed at the establishment of an Australian National Corpus. The progress of this initiative is presented in this paper, alon...

Towards the Design of the Australian National Corpus

Corpora are becoming more and more important as a research tool for linguists as they are large collections of authentic text. However, not every researcher has the time and resources to compile their own corpus. Large corpora in the world such as the BNC, the ANC or the International Corpus of English (ICE) have been widely used for research on the English language in general or an English dia...

A Bayesian model decision support system: dryland salinity management application

Addressing environmental management problems at catchment scales requires an integrated modelling approach, in which key bio-physical and socio-economic drivers, processes and impacts are all considered. Development of Decision Support Systems (DSSs) for environmental management is rapidly progressing. This paper describes the integration of physical, ecological, and socio-economic components i...

ORTOLANG an infrastructure for sharing of written and speech language resources (ORTOLANG : une infrastructure de mutualisation de ressources linguistiques écrites et orales) [in French]

Abstract. We propose a demonstration of the platform of the Equipex ORTOLANG (Open Resources and Tools for LANGuage: www.ortolang.fr), currently being set up under the Investments for the Future programme (PIA, programme d'investissements d'avenir) launched by the French government. Building, among other things, on the existing resource centres CNRTL (Centre National de Ressources Textuelles et Lexicales: www.cnrtl...

Publication date: 2012